netdev CI testing #6666
Open: kuba-moo wants to merge 470 commits into kernel-patches:bpf-next_base from linux-netdev:to-test
+17,175
−4,599
4f22ee0 to 8a9a8e0
64c403f to 8da1f58
78ebb17 to 9325308
c8c7b2f to a71aae6
9325308 to 7940ae1
d8feb00 to b16a6b9
7940ae1 to 8f1ff3c
4164329 to c5cecb3
A number of dwmac variants from Rockchip SoCs have turned up in the Rockchip-specific binding, but not in the main list in snps,dwmac.yaml which as the comment indicates is needed for accurate matching. So add the missing rk3528, rk3568 and rv1126 to the main list. Reviewed-by: Andrew Lunn <[email protected]> Acked-by: Conor Dooley <[email protected]> Signed-off-by: Heiko Stuebner <[email protected]> Signed-off-by: NipaLocal <nipa@local>
Rockchip RK3506 has two Ethernet controllers based on Synopsys DWC Ethernet QoS IP. Add compatible string for the RK3506 variant. Reviewed-by: Andrew Lunn <[email protected]> Acked-by: Conor Dooley <[email protected]> Signed-off-by: Heiko Stuebner <[email protected]> Signed-off-by: NipaLocal <nipa@local>
Add the needed glue blocks for the RK3506-specific setup. The RK3506 dwmac only supports up to 100MBit with a RMII PHY, but no RGMII. Signed-off-by: David Wu <[email protected]> Signed-off-by: Heiko Stuebner <[email protected]> Signed-off-by: NipaLocal <nipa@local>
The dwmac-rk glue driver is currently not caught by the general maintainer entry for Rockchip SoCs, so add it explicitly, similar to the i2c driver. The binding document in net/rockchip-dwmac.yaml already gets caught by the wildcard match. Signed-off-by: Heiko Stuebner <[email protected]> Signed-off-by: NipaLocal <nipa@local>
When the server has MPTCP enabled but receives a non-MP-capable request from a client, it calls mptcp_fallback_tcp_ops(). Since non-MPTCP connections are allowed to use sockmap, which replaces sk->sk_prot, using sk->sk_prot to determine the IP version in mptcp_fallback_tcp_ops() becomes unreliable. This can lead to assigning incorrect ops to sk->sk_socket->ops. Additionally, when BPF Sockmap modifies the protocol handlers, the original WARN_ON_ONCE(sk->sk_prot != &tcp_prot) check would falsely trigger warnings. Fix this by using the more stable sk_family to distinguish between IPv4 and IPv6 connections, ensuring correct fallback protocol operations are selected even when BPF Sockmap has modified the socket protocol handlers. Fixes: 0b4f33d ("mptcp: fix tcp fallback crash") Cc: <[email protected]> Signed-off-by: Jiayuan Chen <[email protected]> Reviewed-by: Jakub Sitnicki <[email protected]> Signed-off-by: NipaLocal <nipa@local>
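The difference between the two selection strategies described above can be sketched in userspace C. This is only a model: every `*_model` type and object is a hypothetical stand-in for the kernel's `struct sock`, `struct proto` and `struct proto_ops`, and the shape of the old pointer-comparison check is an assumption based on the commit message, not a copy of the kernel source.

```c
#include <sys/socket.h> /* AF_INET, AF_INET6 */

/* Hypothetical stand-ins for kernel objects; not the real definitions. */
struct proto_model { const char *name; };
struct ops_model { const char *name; };

static const struct proto_model tcpv6_prot_model = { "tcpv6_prot" };
static const struct proto_model sockmap_prot_model = { "sockmap-replaced prot" };
static const struct ops_model inet_stream_ops_model = { "inet_stream_ops" };
static const struct ops_model inet6_stream_ops_model = { "inet6_stream_ops" };

struct sock_model {
    int family;                     /* models sk->sk_family */
    const struct proto_model *prot; /* models sk->sk_prot   */
};

/* Assumed old behaviour: infer the IP version from the proto pointer.
 * Once sockmap has swapped the pointer, an IPv6 socket no longer
 * matches tcpv6_prot_model and is treated as IPv4. */
static const struct ops_model *fallback_ops_by_prot(const struct sock_model *sk)
{
    if (sk->prot == &tcpv6_prot_model)
        return &inet6_stream_ops_model;
    return &inet_stream_ops_model;
}

/* Fixed behaviour in this model: the address family is not touched by
 * sockmap, so it stays a reliable discriminator. */
static const struct ops_model *fallback_ops_by_family(const struct sock_model *sk)
{
    if (sk->family == AF_INET6)
        return &inet6_stream_ops_model;
    return &inet_stream_ops_model;
}
```

For an IPv6 socket whose `prot` points at `sockmap_prot_model`, the pointer-based helper returns the IPv4 ops while the family-based helper still returns the IPv6 ops.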
MPTCP creates subflows for data transmission, and these sockets should not be added to sockmap because MPTCP sets specialized data_ready handlers that would be overridden by sockmap. Additionally, for the parent socket of MPTCP subflows (plain TCP socket), MPTCP sk requires specific protocol handling that conflicts with sockmap's operation(mptcp_prot). This patch adds proper checks to reject MPTCP subflows and their parent sockets from being added to sockmap, while preserving compatibility with reuseport functionality for listening MPTCP sockets. We cannot add this logic to sock_map_sk_state_allowed() because the sockops path doesn't execute this function, and the socket state coming from sockops might be in states like SYN_RECV. So moving sock_map_sk_state_allowed() to sock_{map,hash}_update_common() is not appropriate. Instead, we introduce a new function to handle MPTCP checks. Fixes: 0b4f33d ("mptcp: fix tcp fallback crash") Cc: <[email protected]> Signed-off-by: Jiayuan Chen <[email protected]> Suggested-by: Jakub Sitnicki <[email protected]> Signed-off-by: NipaLocal <nipa@local>
Add test cases to verify that when MPTCP falls back to plain TCP sockets, they can properly work with sockmap. Additionally, add test cases to ensure that sockmap correctly rejects MPTCP sockets as expected. Signed-off-by: Jiayuan Chen <[email protected]> Signed-off-by: NipaLocal <nipa@local>
Currently, in hclge_mii_ioctl(), the operation to read the PHY register (SIOCGMIIREG) always returns 0. This patch changes the return type of hclge_read_phy_reg(), returning an error code when the function fails. Fixes: 024712f ("net: hns3: add ioctl support for imp-controlled PHYs") Signed-off-by: Jijie Shao <[email protected]> Signed-off-by: NipaLocal <nipa@local>
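A rough userspace sketch of the return-type change follows. The names `phy_model`, `read_phy_reg_model` and `mii_ioctl_read_model` are hypothetical, and the failure conditions and error codes are assumptions chosen for illustration; only the idea (return a status and pass the register value through an out-parameter so the ioctl path can report failure) comes from the commit message.

```c
#include <errno.h>
#include <stdint.h>

#define PHY_REG_COUNT 32

/* Hypothetical PHY model; `fail` simulates a firmware/MDIO error. */
struct phy_model {
    int fail;
    uint16_t regs[PHY_REG_COUNT];
};

/* After the change (as modelled here): the status is the return value
 * and the register content travels through *val, so a failed read is
 * distinguishable from a register that legitimately reads as 0. */
static int read_phy_reg_model(const struct phy_model *phy, unsigned int reg,
                              uint16_t *val)
{
    if (reg >= PHY_REG_COUNT)
        return -EINVAL;
    if (phy->fail)
        return -EIO;
    *val = phy->regs[reg];
    return 0;
}

/* Models the SIOCGMIIREG branch: propagate the error instead of
 * unconditionally returning 0. */
static int mii_ioctl_read_model(const struct phy_model *phy, unsigned int reg,
                                uint16_t *val_out)
{
    return read_phy_reg_model(phy, reg, val_out);
}
```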
Currently, when debugfs and reset are executed concurrently, some resources are released during the reset process, which may cause debugfs to read null pointers or other anomalies. Therefore, in this patch, interception protection has been added to debugfs operations that are sensitive to reset. Fixes: eced3d1 ("net: hns3: use seq_file for files in queue/ in debugfs") Signed-off-by: Jijie Shao <[email protected]> Signed-off-by: NipaLocal <nipa@local>
In efx_mae_enumerate_mports(), memory allocated for mae_mport_desc is passed as an argument to efx_mae_process_mport(), but when the error path in efx_mae_process_mport() is taken, the memory allocated for desc is leaked. Fix that by freeing the allocation before returning the error. Fixes: a6a15ac ("sfc: enumerate mports in ef100") Acked-by: Edward Cree <[email protected]> Signed-off-by: Abdun Nihaal <[email protected]> Signed-off-by: NipaLocal <nipa@local>
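The ownership pattern behind this fix can be modelled in plain C. Everything below is a hypothetical sketch: the `*_model` names, the fixed-size table and the "negative id is invalid" error condition are assumptions made up for the example, and `live_descs` exists only to make the leak observable.

```c
#include <errno.h>
#include <stdlib.h>

#define MAX_MPORTS 8

struct mport_desc_model { int id; };

struct mport_table_model {
    struct mport_desc_model *slots[MAX_MPORTS];
    int count;
};

static int live_descs; /* instrumentation: descriptors not yet freed */

static struct mport_desc_model *desc_alloc(int id)
{
    struct mport_desc_model *desc = malloc(sizeof(*desc));

    if (desc) {
        desc->id = id;
        live_descs++;
    }
    return desc;
}

static void desc_free(struct mport_desc_model *desc)
{
    free(desc);
    live_descs--;
}

/* Takes ownership of desc. On the error path it must free desc itself,
 * because the caller no longer tracks it; that free is the fix. */
static int process_mport_model(struct mport_table_model *t,
                               struct mport_desc_model *desc)
{
    if (desc->id < 0 || t->count >= MAX_MPORTS) {
        desc_free(desc);
        return -EINVAL;
    }
    t->slots[t->count++] = desc;
    return 0;
}

static int enumerate_mports_model(struct mport_table_model *t,
                                  const int *ids, int n)
{
    for (int i = 0; i < n; i++) {
        struct mport_desc_model *desc = desc_alloc(ids[i]);
        int rc;

        if (!desc)
            return -ENOMEM;
        rc = process_mport_model(t, desc);
        if (rc)
            return rc;
    }
    return 0;
}

/* Releases every descriptor the table still owns. */
static void table_release_model(struct mport_table_model *t)
{
    while (t->count > 0)
        desc_free(t->slots[--t->count]);
}
```

Without the `desc_free()` call on the error path, `live_descs` would stay above zero after the table is released.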
The changes introduced in commit dc82a33 ("veth: apply qdisc backpressure on full ptr_ring to reduce TX drops") have been found to cause a race condition in production environments. Under specific circumstances, observed exclusively on ARM64 (aarch64) systems with Ampere Altra Max CPUs, a transmit queue (TXQ) can become permanently stalled. This happens when the race condition leads to the TXQ entering the QUEUE_STATE_DRV_XOFF state without a corresponding queue wake-up, preventing the attached qdisc from dequeueing packets and causing the network link to halt.
As a first step towards resolving this issue, this patch introduces a failsafe mechanism. It enables the net device watchdog by setting a timeout value and implements the .ndo_tx_timeout callback. If a TXQ stalls, the watchdog will trigger the veth_tx_timeout() function, which logs a warning and calls netif_tx_wake_queue() to unstall the queue and allow traffic to resume. The log message will look like this:
  veth42: NETDEV WATCHDOG: CPU: 34: transmit queue 0 timed out 5393 ms
  veth42: veth backpressure stalled(n:1) TXQ(0) re-enable
This provides a necessary recovery mechanism while the underlying race condition is investigated further. Subsequent patches will address the root cause and add more robust state handling in ndo_open/ndo_stop. Fixes: dc82a33 ("veth: apply qdisc backpressure on full ptr_ring to reduce TX drops") Signed-off-by: Jesper Dangaard Brouer <[email protected]> Signed-off-by: NipaLocal <nipa@local>
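The failsafe described in this commit can be approximated with a small single-threaded model. The struct, the function names and the 5000 ms timeout used in the usage note are all assumptions for illustration; the real mechanism is the kernel's net device watchdog invoking the driver's .ndo_tx_timeout callback.

```c
#include <stdbool.h>

/* Hypothetical model of one transmit queue. */
struct txq_model {
    bool stopped;                 /* models QUEUE_STATE_DRV_XOFF      */
    unsigned long trans_start_ms; /* time of the last transmit        */
    unsigned long timeouts;       /* number of watchdog recoveries    */
};

/* Models the timeout callback: count the event and wake the queue so
 * the qdisc can dequeue again. */
static void tx_timeout_model(struct txq_model *q)
{
    q->timeouts++;
    q->stopped = false;
}

/* Models one watchdog tick: a queue that has been stopped for longer
 * than timeout_ms is considered stalled and gets recovered.
 * Returns true when a recovery was triggered. */
static bool watchdog_tick_model(struct txq_model *q, unsigned long now_ms,
                                unsigned long timeout_ms)
{
    if (q->stopped && now_ms - q->trans_start_ms > timeout_ms) {
        tx_timeout_model(q);
        q->trans_start_ms = now_ms;
        return true;
    }
    return false;
}
```

With an assumed timeout of 5000 ms, a queue stopped at time 0 is left alone by a tick at 1000 ms and recovered by a tick at 5393 ms. This only masks the stall; it does not remove the race that caused it.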
The veth driver started manipulating TXQ states in commit dc82a33 ("veth: apply qdisc backpressure on full ptr_ring to reduce TX drops"). Other drivers that manipulate TXQ states take care of stopping and starting TXQs in their NDOs. Thus, add this to veth .ndo_open and .ndo_stop. Fixes: dc82a33 ("veth: apply qdisc backpressure on full ptr_ring to reduce TX drops") Signed-off-by: Jesper Dangaard Brouer <[email protected]> Signed-off-by: NipaLocal <nipa@local>
Commit dc82a33 ("veth: apply qdisc backpressure on full ptr_ring to reduce TX drops") introduced a race condition that can lead to a permanently stalled TXQ. This was observed in production on ARM64 systems (Ampere Altra Max). The race occurs in veth_xmit(). The producer observes a full ptr_ring and stops the queue (netif_tx_stop_queue()). The subsequent conditional logic, intended to re-wake the queue if the consumer had just emptied it (if (__ptr_ring_empty(...)) netif_tx_wake_queue()), can fail. This leads to a "lost wakeup" where the TXQ remains stopped (QUEUE_STATE_DRV_XOFF) and traffic halts.
This failure is caused by an incorrect use of the __ptr_ring_empty() API from the producer side. As noted in kernel comments, this check is not guaranteed to be correct if a consumer is operating on another CPU. The empty test is based on ptr_ring->consumer_head, making it reliable only for the consumer. Using this check from the producer side is fundamentally racy.
This patch fixes the race by adopting the more robust logic from an earlier version V4 of the patchset, which always flushed the peer:
(1) In veth_xmit(), the racy conditional wake-up logic and its memory barrier are removed. Instead, after stopping the queue, we unconditionally call __veth_xdp_flush(rq). This guarantees that the NAPI consumer is scheduled, making it solely responsible for re-waking the TXQ.
(2) On the consumer side, the logic for waking the peer TXQ is centralized. It is moved out of veth_xdp_rcv() (which processes a batch) and placed at the end of the veth_poll() function. This ensures netif_tx_wake_queue() is called once per complete NAPI poll cycle.
(3) Finally, the NAPI completion check in veth_poll() is updated. If NAPI is about to complete (napi_complete_done), it now also checks if the peer TXQ is stopped. If the ring is empty but the peer TXQ is stopped, NAPI will reschedule itself. This prevents a new race where the producer stops the queue just as the consumer is finishing its poll, ensuring the wakeup is not missed.
Fixes: dc82a33 ("veth: apply qdisc backpressure on full ptr_ring to reduce TX drops") Signed-off-by: Jesper Dangaard Brouer <[email protected]> Signed-off-by: NipaLocal <nipa@local>
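The three steps of this fix can be laid out as a single-threaded model. It cannot reproduce the race itself, which needs two CPUs, and every name here (`veth_model`, `xmit_model`, `poll_model`, `napi_must_reschedule`, `RING_SIZE`) is hypothetical; the sketch only shows which side owns the wakeup after the change.

```c
#include <stdbool.h>

#define RING_SIZE 4 /* assumed tiny ring, for illustration only */

struct veth_model {
    int ring_len;        /* entries currently in the ptr_ring model */
    bool txq_stopped;    /* peer TXQ stopped by the producer        */
    bool napi_scheduled; /* consumer poll is pending                */
};

/* Step (3): completion is only allowed when the ring is empty AND the
 * peer TXQ is running. In the concurrent kernel the producer may stop
 * the queue between the wake in step (2) and this check. */
static bool napi_must_reschedule(int ring_len, bool txq_stopped)
{
    return ring_len > 0 || txq_stopped;
}

/* Producer side. Step (1): on a full ring, stop the queue and always
 * schedule the consumer; no producer-side emptiness test remains. */
static bool xmit_model(struct veth_model *v)
{
    if (v->ring_len == RING_SIZE) {
        v->txq_stopped = true;
        v->napi_scheduled = true;
        return false; /* packet not queued; qdisc applies backpressure */
    }
    v->ring_len++;
    v->napi_scheduled = true;
    return true;
}

/* Consumer side: one poll cycle. Returns the number of packets handled. */
static int poll_model(struct veth_model *v, int budget)
{
    int done = 0;

    v->napi_scheduled = false;
    while (done < budget && v->ring_len > 0) {
        v->ring_len--;
        done++;
    }

    /* Step (2): wake the peer TXQ once, at the end of the poll cycle. */
    if (v->txq_stopped)
        v->txq_stopped = false;

    /* Budget exhausted: stay scheduled. Otherwise apply step (3). */
    if (done == budget || napi_must_reschedule(v->ring_len, v->txq_stopped))
        v->napi_scheduled = true;

    return done;
}
```

Filling the ring, attempting one more transmit and then running one poll leaves the queue awake and the ring empty, with the wakeup issued by the consumer only.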
Alex will send phylink patches soon which will make us link up on QEMU again, but for now let's hack up the link. Gives us a chance to add another QEMU NIC test to "HW" runners in the CI. Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: NipaLocal <nipa@local>
Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: NipaLocal <nipa@local>
Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: NipaLocal <nipa@local>
Let's see if this increases stability of timing-related results. Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: NipaLocal <nipa@local>
Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: NipaLocal <nipa@local>
Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: NipaLocal <nipa@local>
Signed-off-by: NipaLocal <nipa@local>
Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: NipaLocal <nipa@local>
Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: NipaLocal <nipa@local>
These are unlikely to matter for CI testing and they slow things down. Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: NipaLocal <nipa@local>
tc_actions.sh keeps hanging the forwarding tests. sdf@: tdc & tdc-dbg started intermittently failing around Sep 25th. Signed-off-by: NipaLocal <nipa@local>
Signed-off-by: NipaLocal <nipa@local>
We exclusively use headless VMs today, don't waste time compiling sound and GPU drivers. Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: NipaLocal <nipa@local>
kmemleak auto scan could be a source of latency for the tests. We run a full scan after the tests manually, we don't need the autoscan thread to be enabled. Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: NipaLocal <nipa@local>
Reusable PR for hooking netdev CI to BPF testing.